INTRODUCTION

With the Tandem Journal, we are pleased to offer you a unique
source of technical information about Tandem products. The
purpose of the Journal is to bring you the perspectives of
Tandem developers, engineers, and support analysts on Tandem
hardware and software. Quarterly, the Journal will present
in-depth articles on product research, design, and
implementation.
In this first issue, analysts in our Customer Application
Support and Product Management groups bring you their insights
into system performance and architecture, and the
implementation of various software products. In future issues,
Tandem hardware and software developers will also discuss their
research and the design of Tandem’s newest products.
Our goal is to provide you with interesting and useful
information, unavailable elsewhere, that will help you to use
your Tandem system most effectively. We welcome your
comments and suggestions for future issues.
PATHFINDER: An Aid for Application Development

PATHFINDER is an efficient new tool for the design of screens
for PATHWAY™ applications. It allows designers and users to
define and create the screens together on-line, and to
simulate the interaction of the screens before a single line
of code is written. Created by Tandem analysts and used in
the development of a variety of applications at Tandem, it is
also available in the International Tandem User’s Group
(ITUG) library.
This article discusses how PATHFINDER 
solves one of the major problems of effective 
screen design and presents an overview of 
how it works. 


The Problem 
A recurring problem in the development 
of on-line computer applications is the 
lack of user input in the development process. 
Typically, hands-on involvement for users 
starts after the application has been designed 
and coded. At that time, the users invar- 
iably request changes, but the designers and 
programmers may resist making extensive 
modifications because of the significant effort 
required to rewrite the code. The changes
are often compromised to such a degree that
the resulting screens may not resemble what 
the users want at all. 

For successful application design, it is 
essential for designers to maintain a dialogue 
with the primary users of the proposed sys- 
tem so that the application requirements can 
be accurately defined. In addition to this, 
the users must be able to test the application's 
user interface before coding, so that they 
can see and feel how the finished product 
will operate. User dissatisfaction with an 
application can essentially be eliminated if the 
users are actively involved in the creation 
and testing of the application design. 

Currently, the most common method of 
demonstrating screen layouts and the logical 
flow of the application is to present the 
screens to the users on paper layouts. The 
users must then simulate the proposed appli- 
cation in their minds while shuffling through 
stacks of paper. Clearly, this technique does 
not represent effective design or simulation. 


The Solution 


PATHFINDER makes it easy for the
users and designers to define the user interface
to the system at the beginning of the
application development cycle. It also allows 
them to create screens on-line and to dem- 
onstrate exactly how the screens will interact 
during the execution of the application. 
Since this is all done before any application 
code is written, modifications can be made 
quickly and easily. 
The time line in Figure 1 shows how, with
PATHFINDER, users are actively involved 
in both the requirement specifications phase 
and the screen design phase of the development 
project. 


How PATHFINDER Works 


PATHFINDER is a PATHWAY application
program consisting of five requesters and
two servers. It can be incorporated into an
existing PATHWAY system (the recommended
method) or run as a separate PATHWAY
system. PATHFINDER has self-contained 
documentation, and each screen has sup- 
porting HELP screens to facilitate its use. 

PATHFINDER works on edit files, each file
containing an image of one of the application
screens. Each edit file consists of a single
screen of information, and all of them must 
reside on the same volume and subvolume. The 
designer uses PATHFINDER to build a rela- 
tional data base that links together the screen 
image files. The relational data base con- 
sists of a set of files that define the navigation 
logic among the screen images. PATHFINDER 
allows the designer to build and update the 
data base, edit files, and simulate user inter- 
action with the application. Figure 2 depicts 
the relationship of PATHFINDER to the Editor, 
the screen image files, and the navigation 
data base. 


Figure 1
Application development time line when PATHFINDER is used.
PATHFINDER contributes to successful application design by
involving users in the screen design phase of the development.
Figure 2
PATHFINDER allows the designer to create screen images in
edit files, to define the navigation among the screens in a
relational data base, and then to simulate screen navigation
as it would occur in the finished application.


Navigation Data Base 


The designer uses the ADD OR UPDATE
DATA BASE facility of PATHFINDER
to build the navigation data base. The interaction
between the user and the simulated
application is defined exactly as it will
occur in the live application.

PATHFINDER presents a series of screens 
for defining or updating the actions asso- 
ciated with each application screen image. In 
most on-line PATHWAY applications, either 
function keys or entered data drive an applica- 
tion. To accommodate this, PATHFINDER 
allows actions to be assigned to any of the 32 
function keys. These actions include: 


• Moving to the next screen.
• Remaining on the current screen.
• Returning to a previously displayed screen.
• Terminating the application.
• Performing one of several possible actions
on data that is entered with that function key.


The last action, the MULTIPLE ALTER- 
NATIVE function, allows simulation of data 
entry. Instead of entering data, the user 
selects a description of the data that is to be 
entered. 

Construction of the navigation data base 
can be accomplished easily in a matter of 
hours. (For example, the navigation data base 
for the student registration system at Tandem’s 
Corporate Education Center was created 
and demonstrated in approximately three 
hours. The system consists of 54 screens.) 


Simulation can be performed on a navigation 
data base that is either partially or totally 
complete by entering the NAVIGATION OF 
APPLICATION SCREENS facility. During 
navigation, PATHFINDER behaves exactly as 
if it were the application; selection of a 
function key produces the appropriate appli- 
cation response. When there are multiple 
alternatives for a selected function key, PATH- 
FINDER displays a screen containing 
descriptions of the alternatives, and the user 
selects one. 


Beginning of Application Implementation 


Once PATHFINDER has been used to
design and test the user interface with the
planned application, the screens in the edit 
files can be used in the coding phase of appli- 
cation development. With minor modifica- 
tions, the edit files can be used as input to 
PATHAID, the PATHWAY screen builder, 
to generate SCREEN COBOL code for the 
requesters. 


Training Tool and Documentation Aid 
As an additional benefit, PATHFINDER
is an excellent training tool. It allows users
to begin training on the simulated appli- 
cation as soon as screen design and testing are 
complete, long before the application is 
ready for production. 

Also, since the screen images are stored
in edit files, they can easily be incorporated
into user documentation as illustrations. 


Conclusion 


PATHFINDER provides users of Tandem
NonStop and NonStop II systems with the
ability to define and demonstrate proposed 
user interaction with an on-line application 
system at the beginning of the development 
cycle. Among its benefits are: 


• User involvement at the beginning of the
development cycle, not at the end, as is
customary. The application is up sooner with
little or no user dissatisfaction.

• Simulation of user interaction with the
application.

• User acceptance and enthusiasm for the
finished product because of involvement in the
design stage.

• Input of the screen edit files to PATHAID
to generate SCREEN COBOL code for the
application.

• User training on the simulated application
during the development phase.

• Screen illustrations for documentation.


Sandy Benett has been a senior systems analyst in Marketing 
Technical Support's Program Development Group since joining 
Tandem in August, 1982. Currently, he is involved in enhancing 
and simplifying the application development cycle, working on 
PATHFINDER, and teaching PATHWAY. He has also been involved 
in the performance and tuning aspects of Tandem systems and 
the enhancement of user interaction with computer systems. Sandy 
was a systems analyst before joining Tandem, and also taught 
computer science at the junior college level. 
The Performance Characteristics of Tandem NonStop Systems

In 1982, the Tandem Design Support group
ran a series of experiments designed to
measure the performance of NonStop™ and
NonStop II™ systems (in terms of transaction
throughput under PATHWAY and TMF). The
purpose of this article is to look at the 
results that were obtained and to draw some 
preliminary conclusions about what the 
observed performance characteristics mean. 


The Hardware and Software Configuration 


Shown in Figure 1 is the hardware and
software configuration used in the experiments.
Each system had eight CPUs, eight
primary paths to disc (one on each CPU),
eight primary paths to X.25 high-speed
lines (one on each CPU), and eight virtual
circuits (terminals) per line. In each case,
the PATHWAY system consisted of: one
Terminal Control Process (TCP) in each
CPU, controlling all the terminals on the line
physically primaried to that CPU, and seven
general-purpose servers in each CPU, all linked
to the TCP in that CPU. The experimental
systems were hard/soft configured in such a
way that there was no TCP context swapping
and no page faulting. Furthermore, the
TCP user logic (procedure division) had
merely to select the transaction types at the
correct frequencies.

The disc accessing was genuinely random 
across all eight paths. That is, although the 
initial processing of a given transaction 
(from the line handler to the TCP to the server) 
was handled entirely within a single CPU, 
disc activity for that transaction was multi- 
plexed across all CPUs at random. 

We provided enough servers (and memory) 
to ensure that 56 terminals could be in 
transaction state at the same time without 
forcing the TCP to wait for links to free up. 
This was definitely overkill because the 
probability that all terminals would be active 
at the same time at the defined arrival rates 
was very close to zero. In fact, even at the peak 
workload, there was a 95% certainty that a 
network of several hundred terminals generat- 
ing that workload could be handled with- 
out 56 of them being in transaction state at 
the same time. 

The priority scheme adopted for the various 
software components was as follows: disc 
process highest, line handler next highest, 
servers next highest, and TCPs lowest. 
The Application


A simple data base and three different 
transactions were defined. The relative 
arrival rate of the three transactions remained 
constant (at fixed percentages of the total 
load), and the incoming and outgoing message 
size was the same for each. However, the 
atomic CPU (or dedicated CPU cycles) cost 
for each differed, as did the number of 
logical disc I/Os and the amount of physical 
(elapsed or processor-overlappable) disc
time required.

Particular transaction types within any 
application have a profile that averages out 
over time, i.e. a certain number of com- 
munications I/Os of certain sizes, a certain 
amount of processing, and a certain amount 
of disc and cache I/O. (The nature of the profile
depends on such things as line speed,
line protocol, data base structure, number and 
type of file system calls, etc.) At various 
times during the day, the various transactions 
arrive at different rates. However, for any 
one of those time periods, it is possible to 
define an “average transaction” whose pro- 
file is made up of the profiles of its constituent 
transactions weighted by their arrival rates. 

Figure 2 shows how the resource consump- 
tion of a particular transaction can be sum- 
marized; Figure 3 shows how the properties 
of three transactions similar to those used 
in our experiment can be summarized in a 
single average transaction; and in Figure 4, 
that average transaction is diagrammed. Values 
used are for illustration only and are not 
necessarily those obtained in the experiments. 
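
The weighting described above can be sketched in a few lines of Python. This is only an illustration of the arithmetic, not the procedure used in the experiments: the `Profile` fields, the three sample profiles, and the arrival-rate shares are hypothetical values chosen for the example.

```python
from dataclasses import dataclass


@dataclass
class Profile:
    """Resource profile of one transaction type (hypothetical fields)."""
    cpu_ms: float    # atomic (dedicated) CPU time
    disc_ios: float  # logical disc I/Os
    disc_ms: float   # elapsed physical disc time


def average_transaction(profiles, shares):
    """Weight each profile by its share of the total arrival rate."""
    total = sum(shares)
    weights = [s / total for s in shares]
    return Profile(
        cpu_ms=sum(w * p.cpu_ms for w, p in zip(weights, profiles)),
        disc_ios=sum(w * p.disc_ios for w, p in zip(weights, profiles)),
        disc_ms=sum(w * p.disc_ms for w, p in zip(weights, profiles)),
    )


# Illustrative values only; these are not measurements from the experiment.
profiles = [
    Profile(cpu_ms=500.0, disc_ios=2, disc_ms=75.0),
    Profile(cpu_ms=1000.0, disc_ios=4, disc_ms=150.0),
    Profile(cpu_ms=1500.0, disc_ios=8, disc_ms=300.0),
]
shares = [50, 30, 20]  # percent of arrivals for each transaction type

avg = average_transaction(profiles, shares)
print(avg)
```

Because the shares are normalized inside the function, they can be given as percentages or as raw arrival counts for the time period of interest.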

The test application was developed and 
tuned on the NonStop II system and then transported
to the NonStop system. Although
two minor changes had to be made, it was 
possible to effect this transfer without recom- 
piling or modifying application control files. 


The Experiment 


A terminal simulator running on a separate 
system generated messages into the 
PATHWAY system described above. The 
arrival rate of incoming messages was known 
and could be varied. Each incoming mes- 
sage started a transaction, and a reply back 
out to the (simulated) terminal terminated 
the transaction. 


Figure 1
Hard/soft configuration used in experiments designed to
measure the performance of NonStop and NonStop II systems.
Seven static servers per CPU were linked to the TCP on the
same CPU; each server made disc process requests randomly
across all eight CPUs. One TCP per CPU controlled the
terminals physically connected to the same CPU. All CPUs
were hard/soft configured the same way.


Figure 2
The average resource consumption of a particular
transaction. Above: resource consumption in time sequence
through the life of the transaction. Below: summary of
resource consumption.
Figure 3
Derivation of the average transaction from the actual
transactions. In the experiment, there were three different
transactions; the relative arrival rate of the three remained
constant within the overall arrival rate. In the experiment,
the incoming and outgoing message size was the same for
each transaction, but the atomic CPU cost for each differed,
as did the number of logical disc I/Os, and thus the amount
of elapsed physical disc time. (The minimum response time =
pure line time + atomic CPU time + elapsed disc time. The
elapsed disc time is time that can be overlapped with
processing.)


For each terminal, the full response-time 
distribution was maintained in the simulator. 
The response time was defined as the time 
between the first WRITE or WRITEREAD 
initiation of the transaction (equivalent to 
XMIT or first function key) and the final READ 
or WRITEREAD completion of the trans- 
action (equivalent to reply being displayed, 
and the keyboard unlocking). 

The terminal simulator imposed a variance 
on the selected arrival rate to replicate the 
real-world randomness of transaction initiation, 
while keeping the long-term average arrival 
rate at that selected. In other words, the 
selected arrival rate may have been 60 trans- 
actions per hour from a terminal. However, 
in any five-minute period, that terminal 
may have submitted several more (or fewer) than
five transactions. 
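
The effect can be reproduced with a small sketch. The article does not say how the simulator generated its variance; the exponentially distributed gaps between arrivals used here are an assumption made for illustration, and the function names are hypothetical.

```python
import random


def arrival_times(rate_per_hour, hours, seed=0):
    """Arrival times (in hours) whose gaps are exponential with mean 1/rate."""
    rng = random.Random(seed)
    t, times = 0.0, []
    while True:
        t += rng.expovariate(rate_per_hour)
        if t >= hours:
            return times
        times.append(t)


def counts_per_window(times, hours, window_hours):
    """Number of arrivals in each consecutive window of the run."""
    n = int(round(hours / window_hours))
    counts = [0] * n
    for t in times:
        counts[min(int(t / window_hours), n - 1)] += 1
    return counts


# One terminal at a selected rate of 60 transactions per hour, run for 1000 hours.
times = arrival_times(60, hours=1000)
counts = counts_per_window(times, 1000, 5 / 60)

print(len(times) / 1000)          # long-term rate stays near the selected 60 per hour
print(min(counts), max(counts))   # individual five-minute counts scatter around five
```

Under this assumption the long-term average holds at the selected rate while individual five-minute periods see noticeably more or fewer than five transactions.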

Each simulated terminal had the same 
transaction-generation rate so that, over an 
extended period, the same amount of com- 
munications processing was handled by each 
CPU. Similarly, the disc accessing was 
spread equally across all CPUs (i.e., over an 
extended period, disc process activity in 
each CPU was the same). Thus, the eight-CPU 
system was well balanced, and, as explained 
above, it was well tuned (e.g., all unnecessary 
context swapping and page faulting were 
eliminated). These laboratory conditions were 
established and carefully controlled so that 
all tests would be repeatable. 


Results 


The detailed results of this experiment
have been reported in Tandem internal
publications. Overall, as shown in Figure 5, 
we observed an approximate 22% improvement 
in throughput with the NonStop II system 
(as compared to the NonStop system) given 
by the relative slopes of the graphs of 
throughput versus CPU consumption. Raw 
response-time data are summarized in 
Figure 6; sample simulator response-time 
probability distributions for the average 
transaction at three different system loads (on 
the NonStop II system) are given in Figure 7; 
and sample response-time probability distributions
for the three transaction types at
the same system load (on the NonStop II)
are given in Figure 8.

In Table 1 and Figure 5, CPU utilization 
means total XRAY-measured CPU utilization 
(divided by eight), and the transaction rate 
is that of the total (eight-CPU) system. 
Response times (shown in Table 1 and Figure 6) 
were also given by XRAY. These response 
times do not include line time. 


Conclusions 


As mentioned above, the purpose of this
article is to look at the results that were
obtained for the NonStop II system (see
Table 1 and Figures 4 through 7) and to draw 
some preliminary conclusions about what the 
observed performance characteristics mean. 

The most fascinating aspect of the results 
shown in Figure 5 is the strict linearity of the 
relationship between CPU load and through- 
put. It would appear that in Tandem systems 
the atomic CPU cost per transaction stays 
the same at loads ranging from as low as 35% 
up to as high as 76%. 

This would indicate that under controlled 
loads (loads for which the various queues
in the system do not exceed the memory 
assigned to hold them), the GUARDIAN™ 
operating system is extremely efficient in 
maintaining the real-time, multiprogramming, 
multiprocessing environment. It also indi- 
cates that this low-level operating system cost 
is small compared to the application-level 
costs (PATHWAY, TMF) per transaction, as 
one might expect. 

Other experiments need to be conducted to 
confirm that the linear relationship (i.e., 
throughput proportional to CPU consumption) 
applies from two CPUs up through sixteen. 
Several less rigorous experiments and bench- 
marks have already suggested that this is 
the case. 
Figure 4
Graphic representation of the average transaction. Four
random logical disc accesses per average transaction were
multiplexed equally (over time) across all eight CPUs.
Total atomic processor resource consumption per average
transaction = 916 ms. Total physical disc elapsed time
(not including disc process time) per average
transaction = 150 ms.
Figure 6
Measured response times for the NonStop II system were
very close to the curve obtained by applying simple queuing
theory to “average transactions.”
Table 1.
Experimental results for the NonStop II system.

CPU utilization   Transaction rate         Average transaction
(%)               (transactions/second)    response time (seconds)

35.90             2.66                     1.25
39.63             2.98                     1.28
51.00             3.95                     1.53
61.00             4.80                     1.70
69.90             5.57                     2.05
76.34             6.12                     2.31


The fact that the plot of CPU load versus 
response time (Figure 6) has an easily rec- 
ognizable form indicates that the response 
times themselves are predictable. Other 
properties revealed in Figures 6, 7, and 8 are 
as follows: 


• The greater the resource consumption of
a transaction, the greater its minimum response
time (see Transaction 3 in Figure 8).

• The greater the resource consumption of
a transaction, the greater the queuing delay
experienced by that transaction under load
(see Transaction 3 in Figure 8).

• The relationship between system load and
the average response time for a transaction is
not linear. For example, if a transaction
averages a response time of one second at 45%
load, it is unlikely to attain a response time
as short as two seconds at 90% load on the
same system.

• Response-time distributions are usually
skewed. The average response time is usually
greater than the most probable response
time, and the 95% and 99% certainty response
times are several times the best-case response
times.
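
The nonlinearity in the third point can be illustrated with a short Python sketch. The article refers only to "simple queuing theory"; the single-server formula used here (response time = service time / (1 - load)) is an assumption chosen for the example, not necessarily the model behind Figure 6, and the function names are hypothetical.

```python
def response_time(service_time, load):
    """Mean response time of a single-server queue at the given load (0 <= load < 1)."""
    if not 0 <= load < 1:
        raise ValueError("load must be in [0, 1)")
    return service_time / (1.0 - load)


def service_time_from(observed_response, load):
    """Service time implied by an observed mean response time at a known load."""
    return observed_response * (1.0 - load)


# A transaction that averages one second at 45% load...
s = service_time_from(1.0, 0.45)

# ...slows down far more than proportionally as the load rises.
for load in (0.45, 0.60, 0.75, 0.90):
    print(f"{load:.0%} load: {response_time(s, load):.2f} s")
```

Under this assumed model, doubling the load from 45% to 90% raises the average response time from one second to about 5.5 seconds rather than two.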


If it becomes realistic to describe appli- 
cations in terms of average transaction profiles 
and specified arrival rates, perhaps it is also 
realistic to model such applications assuming 
laboratory conditions (i.e., that communica- 
tions, disc, and processing loads are spread 
equally across all hard components and 
that the soft configuration avoids all unnec- 
essary thrashing or processing, as in our 
experiment). Naturally, the throughput/ 
response times predicted by such a model 
would have to be viewed as a target to aim 
for, not as a guarantee, because anything less 
than perfect balance and tuning would 
result in inferior performance. 

Such a model would be easy to use. The 
only input required would be the application 
transaction definitions, the expected trans- 
action rates, and the required response-time 
distributions. The output would be a sum- 
mary of the hardware required to meet the 
user requirements (assuming good hardware/ 
software design). 

A second conclusion to be drawn from 
the experiments is suggested by the fact that, 
if we extend the straight line formed by the 
data points in Figure 5, it cuts the CPU utili- 
zation axis at about 5%. We assume that 
this number represents the XRAY and statistics- 
gathering software overhead (i.e., that in 
this particular case the GO-interval statistics- 
gathering parameters were such that the 
overall cost over an extended period was 5%). 
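
The extrapolation can be checked against the figures in Table 1 with an ordinary least-squares fit. The sketch below uses only the table values; the choice of least squares and the function name are assumptions for illustration, since the article does not say how the line was drawn.

```python
# Table 1: (CPU utilization in %, transaction rate in transactions/second).
TABLE_1 = [
    (35.90, 2.66),
    (39.63, 2.98),
    (51.00, 3.95),
    (61.00, 4.80),
    (69.90, 5.57),
    (76.34, 6.12),
]


def fit_line(points):
    """Least-squares fit of cpu = intercept + slope * rate."""
    n = len(points)
    mean_cpu = sum(cpu for cpu, _ in points) / n
    mean_rate = sum(rate for _, rate in points) / n
    sxy = sum((rate - mean_rate) * (cpu - mean_cpu) for cpu, rate in points)
    sxx = sum((rate - mean_rate) ** 2 for _, rate in points)
    slope = sxy / sxx
    intercept = mean_cpu - slope * mean_rate
    return intercept, slope


intercept, slope = fit_line(TABLE_1)
print(f"CPU utilization at zero throughput: {intercept:.1f}%")
print(f"CPU utilization per transaction/second: {slope:.2f}%")
```

The intercept is the utilization that remains when the transaction rate is extrapolated to zero, which is the quantity the text attributes to XRAY and statistics gathering.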
Note that the XRAY cost does not vary
with throughput. The GO interval stays the 
same whether the system is operating with 
a 10% or an 80% load. 

It is not possible to balance or tune a system 
without XRAY, and although it is only nec- 
essary to run XRAY at certain times in the 
live system, most often, those will be peak- 
load times. Thus, the cost of running XRAY 
cannot be ignored. However, the experi- 
ment showed that the cost of an extremely 
detailed XRAY analysis of the running sys- 
tem, supplying all the data required to tune 
and balance, was minimal (and predictable). 


Summary 


Tandem has produced a hardware and
software architecture that allows real-time
business problems to be solved with
real-time applications quickly and practically.
Experiments seem to indicate that given
an application’s transaction profiles and projected
loads, Tandem will be able to predict
hardware requirements for user-selected 
performance criteria. 


John Day is a member of Product Management's Performance 
Group. Since joining Tandem early in 1978, John has held various 
corporate and field positions in systems recovery, network design 
and systems modeling. Before joining Tandem, John was an analyst 
in the on-line data processing field for another major main frame 
manufacturer. He received an Hons. degree in Mathematics
from the University of Manchester, England in 1965.
Figure 7
Sample simulator response-time probability distributions
for the average transaction at three different system loads
(on the NonStop II system).
Figure 8
Sample response-time probability distributions for the three
transaction types at the same system load (on the
NonStop II). The total area under each curve is the same and
represents a probability of 1 (i.e., certainty). The average
response time for a transaction is given by:
Average response time = $\sum (\text{probability of R.T.} \times \text{R.T.})$,
where R.T. = response time.
Since these response-time curves are nearly always skewed,
the average is not normally at the 50% cumulative
probability point. The response time at which the cumulative
probability for a transaction is equal to 0.95 gives the 95%
certainty point for that transaction. The shape of the curves
is dependent on the load on the system; the lighter the load,
the taller and slimmer the curve; the heavier the load, the
flatter the curve (and the longer the response times).
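
The two quantities defined in this caption can be computed from a tabulated distribution as sketched below. The distribution values and function names are hypothetical, chosen only to show a skewed case; they are not data from the experiments.

```python
def mean_response(dist):
    """Sum of (probability of R.T. x R.T.) over (response_time, probability) pairs."""
    return sum(rt * p for rt, p in dist)


def certainty_point(dist, level=0.95):
    """Smallest response time whose cumulative probability reaches `level`."""
    cumulative = 0.0
    for rt, p in sorted(dist):
        cumulative += p
        if cumulative >= level - 1e-12:  # small tolerance for rounding
            return rt
    raise ValueError("probabilities sum to less than the requested level")


# Hypothetical skewed distribution: (response time in seconds, probability).
dist = [(1.0, 0.30), (1.5, 0.35), (2.0, 0.15), (3.0, 0.10), (5.0, 0.06), (8.0, 0.04)]

print(mean_response(dist))          # average response time
print(certainty_point(dist, 0.50))  # 50% cumulative probability point
print(certainty_point(dist))        # 95% certainty point
```

In this made-up case the average lies above the 50% point, and the 95% certainty point is several times the best-case response time, as the caption describes for skewed curves.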
TMF and the Multi-Threaded Requester

This article describes the
major considerations 
involved in writing 
a TAL multi-threaded 
requester that utilizes 
the Transaction 
Monitoring Facility 
(TMF). It assumes that 
the requester is to run as part of a process 
pair (a primary and its backup). The concepts 
discussed have already been addressed by 
Tandem and are incorporated into existing 
system software (namely, the PATHWAY 
Terminal Control Process). Therefore, this 
discussion is intended for analysts and 
system programmers who write or work with 
specialized system-level software rather 
than application programs. 

The major considerations addressed include: 


• The correct placement of CHECKPOINTs
with regard to the TMF procedures
BEGINTRANSACTION and ENDTRANSACTION.

• Checkpointing TFILE information.

• The establishment of restart points for
each individual thread.

• The effect of backup creation on the TFILE.

• The need for an active backup process.


CHECKPOINT Placement 


A TMF transaction is a multistep operation
that will transform a data base from
one consistent state to another. The TMF 
transaction is initiated by the application 
process when it calls the procedure BEGIN- 
TRANSACTION. The application process 
terminates the transaction with a call to either 
ENDTRANSACTION or ABORTTRANS- 
ACTION. If a failure of the application pro- 
cess calling BEGINTRANSACTION occurs 
before the call to ENDTRANSACTION, TMF 
will completely back out all audited data 
base changes made on behalf of that TRANSID. 
In the case of the multi-threaded requester, 
it is possible that multiple transactions were 
in progress at the time of the failure. TMF 
will back out each of those transactions. 

Because the multi-threaded requester may 
issue multiple BEGINTRANSACTION calls 
before an intervening ABORTTRANSAC- 
TION or ENDTRANSACTION, the transac- 
tion pseudofile (TFILE) must be used to 
keep track of the various TRANSIDs involved. 
A process can open the TFILE, which is 
not a physical file, by opening the Transaction 
Monitor Process. 

The TFILE provides a mechanism through 
which the requester can manipulate its cur- 
rent TRANSID and maintain a history of each 
TRANSID’s completion status (for use by 
the backup requester in the event of a failure). 
When the requester successfully issues a 
call to BEGINTRANSACTION, a tag identi- 
fying the new TRANSID is returned to the 
requester process, and a new TRANSID is 
placed in the TFILE of the primary requester 
process. When the requester successfully
calls ENDTRANSACTION or ABORTTRANSACTION,
the entry in the TFILE for
that TRANSID is removed. Therefore, it
is necessary to CHECKPOINT the TFILE to
the backup process at certain strategic 
points. The tag returned by BEGINTRANS- 
ACTION allows the process to change its 
current TRANSID by passing the tag to the 
procedure RESUMETRANSACTION. 


As mentioned, a call to BEGINTRANSACTION
will place a new TRANSID in the
TFILE of the primary process, and a call to 
ENDTRANSACTION will remove the 
TRANSID. When the TFILE is CHECK- 
POINTed after a new TRANSID has been 
created, the TRANSID will be placed in the 
backup’s TFILE. A CHECKPOINT of the 
TFILE after the TRANSID has ended or 
aborted will cause the TRANSID to be 
removed from the backup’s TFILE. 

In the case of a multi-threaded requester 
(with a backup) that does not utilize TMF, 
CHECKPOINTs are placed before and after 
each WRITEREAD to a server. In the 
event of a failure, the backup requester must 
take over and continue the transaction until 
it completes. With TMF, however, if a failure 
occurs during a transaction, TMF will back 
out all audited changes made to the data base. 
Therefore, the backup requester is not 
concerned with continuing a transaction, but 
rather with restarting a transaction that 
has been backed out. In this case, the CHECK- 
POINT will be placed near the calls to 
BEGINTRANSACTION and ENDTRANS- 
ACTION or ABORTTRANSACTION. 

The following examples illustrate the incor- 
rect and correct placement of CHECKPOINTs 
in relation to calls to BEGINTRANS- 
ACTION and ENDTRANSACTION. Consider 
first the example of incorrect CHECK- 
POINT placement shown in Figure 1. 

When the failure occurs, the backup process 
will take over at the last CHECKPOINT. 
The backup process's TFILE will have an 
entry for the TRANSID because of the 
CHECKPOINT of the TFILE. However, the 
transaction identified by this TRANSID 
has been backed out by TMF. The program
will issue the SEND to the server but there 
is no current TRANSID. Therefore, the actions 
by the server will fail, the transaction will 
be lost (because the program will prompt the 
terminal for the next transaction), and the 
TRANSID will still occupy space in the TFILE. 


Figure 2 shows the correct placement of 
the CHECKPOINT, just before the call to 
BEGINTRANSACTION. If a failure occurs 
at point 1, TMF will back out all audited 
changes to the data base made by the server, 
and the backup requester will take over at 
the CHECKPOINT. Since the TFILE was 
CHECKPOINTed before the call to BEGIN- 
TRANSACTION, the backup doesn’t know 
about the transaction and can take the original 
transaction data and successfully call 
BEGINTRANSACTION to re-initiate the
transaction.
Another example of the incorrect placement
of a CHECKPOINT is shown in Figure
3. If a failure occurs between ENDTRANS- 
ACTION and the CHECKPOINT, TMF will 
not back out the transaction because the 
call to ENDTRANSACTION has already been 
performed. The backup process will take 
over at the previously executed CHECK- 
POINT. A new TRANSID will be created by 
BEGINTRANSACTION, and the intended 
transaction will be executed twice. Also, the 
new primary’s TFILE will still contain the 
old TRANSID because the CHECKPOINT 
after the call to ENDTRANSACTION was
not performed by the failed primary requester. 

Figure 4 illustrates the correct placement 
of the CHECKPOINT before the call to END- 
TRANSACTION. Consider the actions the 
backup must perform upon take-over at 
CHECKPOINT A and CHECKPOINT B. 

If a failure occurs before CHECKPOINT 
B, the backup will take over at CHECKPOINT 
A. TMF will have backed out the 
transaction that was in progress when the 
failure occurred, and the backup must restore 
the transaction context. BEGINTRANS- 
ACTION will then be called to re-initiate the 
transaction. 

If a failure occurs after CHECKPOINT B 
but before CHECKPOINT A, the backup 
process must determine if the previous trans- 
action has completed, i.e., whether the 
failure occurred at point 2 or at point 3. If 
the failure was at point 2, the transaction 
has not completed, and the backup must 
restore the original transaction context so that 
the transaction can be re-initiated. If the 
failure occurred at point 3, the transaction 
has completed, and the backup can then 
prepare for a new transaction. The process 
must always call RESUMETRANSACTION 
after completing CHECKPOINT B to deter- 
mine whether a failure has occurred and where 
that failure occurred. If an error 90 is returned 
by RESUMETRANSACTION, we know 
that a failure of the primary process occurred 
while a transaction was in progress. We 
can then set up to restart the transaction. 

A failure at point 3 will cause the 
backup process to take over at CHECKPOINT 
B, but in this case the transaction has 
completed. The new primary requester process 
(i.e., the former backup) will issue a 
call to RESUMETRANSACTION, which will 
return an error not equal to 90. It will then 
call ENDTRANSACTION (to clear out the 
TFILE contents of the transaction that just 
completed), and the requester will then 
continue processing on behalf of this thread. 

[Figure: flow diagram showing TERMINAL I/O, CHECKPOINT A, 
BEGINTRANSACTION, failure 1, WRITEREAD TO SERVER, 
CHECKPOINT B, failure 2, ENDTRANSACTION, failure 3] 

Consider next a failure at point 1. TMF 
will back out the transaction, and the backup 
will take over at CHECKPOINT A. At this 
point, the backup TFILE does not know about 
the TRANSID created by the primary 
process. Thus, BEGINTRANSACTION 
will be called, a new TRANSID will be created, 
and the transaction will proceed normally. 

If a failure occurs at point 2, again TMF will 
back out the transaction, and the backup 
process will take over at CHECKPOINT B. 
The requester will then call RESUME- 
TRANSACTION, which will return an error 90, 
indicating that the transaction did not com- 
plete due to a failure of the primary process. 
The process must then call ABORTTRANS- 
ACTION to clear out the TFILE, restore 
the transaction context data, and return 
control back to BEGINTRANSACTION. 
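
The take-over rules for Figure 4 can be summarized as a small decision procedure. The sketch below is written in Python purely for illustration; the names Checkpoint, takeover_actions, and ERR_PRIMARY_FAILED are hypothetical, and only the error number 90 and the procedure names in the returned lists come from the discussion above.

```python
from enum import Enum

ERR_PRIMARY_FAILED = 90  # RESUMETRANSACTION error described in the text


class Checkpoint(Enum):
    A = "A"  # taken before BEGINTRANSACTION
    B = "B"  # taken before ENDTRANSACTION


def takeover_actions(last_checkpoint, resume_error=None):
    """Return the ordered steps the new primary performs for one thread.

    resume_error models the result of RESUMETRANSACTION; it is only
    consulted when the take-over happens at CHECKPOINT B.
    """
    if last_checkpoint is Checkpoint.A:
        # Failure at point 1: TMF backed the transaction out, and the
        # backup's TFILE never learned of the TRANSID.
        return ["restore context", "BEGINTRANSACTION"]
    if resume_error == ERR_PRIMARY_FAILED:
        # Failure at point 2: the transaction did not complete.
        return ["ABORTTRANSACTION", "restore context", "BEGINTRANSACTION"]
    # Failure at point 3: the transaction completed; clear its TFILE entry.
    return ["ENDTRANSACTION", "continue with next transaction"]
```

Calling takeover_actions(Checkpoint.B, 90) yields the point 2 sequence, while any other error value at CHECKPOINT B yields the point 3 sequence.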


CHECKPOINTing TFILE Information 


In this section we will examine the correct 
handling of TFILE sync information in 
CHECKPOINTs to the backup process. 

The CHECKPOINTing of the TFILE is a 
very special situation, as illustrated in Figure 5. 
Whenever a program issues a call to CHECK- 
POINT, the information in the TFILE rela- 
tive to the current TRANSID is sent to the 
backup. The checkpointing facility auto- 
matically performs a call to GETSYNCINFO 
for the primary’s TFILE and a call to SET- 
SYNCINFO for the backup’s TFILE. 

However, in the case of a multi-threaded 
requester, each time the requester is ready to 
service a thread, a call to RESUMETRANS- 
ACTION is performed to set the current 
TRANSID. Therefore, any CHECKPOINTs 
performed during the processing of the 
thread will CHECKPOINT the TFILE con- 
tents for that TRANSID, not the TFILE 
contents for the TRANSID that previously 
initiated or terminated a transaction. This 
means that the backup’s TFILE for the pre- 
vious TRANSID will not be up to date 
with the primary’s TFILE, and if a failure 
occurs, the TFILE in the backup could 
potentially contain entries for transactions that 
were completed by the primary before the 
failure. The backup (now the primary) could 
then experience a BEGINTRANSACTION 
error 83 (process has begun more transactions 
than can be handled) because there would 
be no space available in the TFILE to record 
a new TRANSID. 

The solution to this problem, shown in 
Figure 6, is to have each thread call GET- 
SYNCINFO (on the TFILE) after calls to 
BEGINTRANSACTION, ENDTRANSAC- 
TION, or ABORTTRANSACTION, but before 
the next call to RESUMETRANSACTION. 
Each thread must save its own TFILE information. 
Then, before any CHECKPOINT 
issued on behalf of that thread, a call to SET- 
SYNCINFO must be performed to reinitialize 
the TFILE to the information that was 
saved by the above-mentioned call to GET- 
SYNCINFO. The CHECKPOINT will then 
correctly send the information to the backup, 
and the backup’s TFILE will be up to date 
with the primary’s TFILE. 
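
The save-and-restore discipline described above can be illustrated with a toy model. This is a sketch only: ToyTFile, RequesterThread, and the two helper functions are hypothetical names, the sync information is represented by a plain string, and the methods get_sync_info and set_sync_info merely stand in for GETSYNCINFO and SETSYNCINFO without reproducing their real interfaces.

```python
class ToyTFile:
    """Toy TFILE: holds sync info for whichever TRANSID is current."""

    def __init__(self):
        self.current_info = None

    def get_sync_info(self):
        return self.current_info

    def set_sync_info(self, info):
        self.current_info = info


class RequesterThread:
    def __init__(self, name):
        self.name = name
        self.saved_info = None  # this thread's private copy


def save_after_transaction_call(tfile, thread):
    # Called after BEGIN/END/ABORTTRANSACTION and before the next
    # RESUMETRANSACTION switches the TFILE to another thread.
    thread.saved_info = tfile.get_sync_info()


def checkpoint_on_behalf_of(tfile, thread, backup_info):
    # Reinstate the thread's own sync info, then "send" it to the backup.
    tfile.set_sync_info(thread.saved_info)
    backup_info[thread.name] = tfile.get_sync_info()


tfile = ToyTFile()
t1, t2 = RequesterThread("t1"), RequesterThread("t2")
backup = {}

tfile.current_info = "TRANSID-1 begun"   # thread t1 begins a transaction
save_after_transaction_call(tfile, t1)
tfile.current_info = "TRANSID-2 begun"   # requester now services thread t2
save_after_transaction_call(tfile, t2)

checkpoint_on_behalf_of(tfile, t1, backup)  # backup receives t1's info
```

Without the call to set_sync_info inside checkpoint_on_behalf_of, the checkpoint for t1 would carry the information for TRANSID-2, which is the situation the text warns against.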


Establishing Restart Points for Each Thread 


As the requester process is executing, 
each individual thread will be in a different 
state. For example, one thread might be 
waiting for a server reply, while another 
thread might be waiting for terminal I/O com- 
pletion. The requester process must identify, 
for each individual thread, valid states that 
can be used to restart the threads in the event 
of a failure, and the backup process must 
receive this information. It is possible, when 
a restart point is reached for a thread, that 
no backup process exists. Yet, when the backup 
is created, it needs to know each thread's 
restart point. This requires that the requester 
safe-store the context area for each thread. 

[Figure: flow diagram showing TERMINAL I/O, WRITE TO SWAP FILE, 
CHECKPOINT (INCLUDING TFILE) A, WRITEREAD TO SERVER, 
CHECKPOINT (INCLUDING TFILE) B, ENDTRANSACTION] 

The context data that is safe-stored by 
the primary could be located in the primary’s 
data area and CHECKPOINTed to the 
backup. However, the data area is of finite 
size, and such a strategy could limit the 
number of threads the requester could service. 
As an alternative, the primary requester 
process can safe-store context data for each 
thread in a disc file (swap file). Upon take-over, 
the backup could restart each thread 
from the last known restart point by reading 
the context data from the swap file. 
The following examples illustrate the correct 
placement of the WRITE to the swap file 
in relation to the calls to CHECKPOINT. 
Consider first Figure 7, in which the WRITE 
to the swap file is placed after the call to 
CHECKPOINT. If a failure occurs at point 1, 
the primary process will have completed 
the terminal I/O, but that context information 
will not yet be written to the swap file. 
The backup will take over at CHECKPOINT A 
and attempt to start a transaction. However, 
because the primary process has not informed 
the backup of the new context for this 
thread, the original transaction will be lost. 

By placing the WRITE to the swap file before 
the call to CHECKPOINT as in Figure 8, 
you can ensure that the primary requester will 
have recorded the current thread’s context 
in the swap file. The first WRITE to the swap 
file will record the results of the terminal 
I/O, and the second WRITE to the swap file 
will record the results of the server I/O. 
Now if a failure occurs at point 1, the backup 
process can take over, restore the context 
for the thread, and successfully begin the 
new transaction. 

Figure 9 illustrates a problem that could 
occur with the context area of the swap file. 
If a failure occurs at point 1, the WRITE 
to the swap file at B will have overwritten 
the context data written to the swap file 
at A. TMF will back out the transaction, and the 
backup will take over at CHECKPOINT A 
and read the context area from the WRITE to 
the swap file at B. However, the SEND to 
the server has caused the original context area 
to be altered. It is this altered version of 
the context that was written to the swap file 
at B. When the backup requester takes over, 
it will resume execution at A, but with the 
context as it appeared at B. This incon- 
sistency may cause the restarted transaction 
to behave differently. 

Figure 10 provides a solution to the above 
problem, which involves the establishment 
(within the swap file) of two areas for context 
data for each thread. Failure 1 will cause 
the backup to take over at CHECKPOINT A 
after TMF has backed out the transaction. 
Because the location of the context area has 
been CHECKPOINTed, the backup can 
now read the context that was written to the 
swap file at A and restart the transaction 
normally. 

If a failure occurs after the ENDTRANSACTION, 
the backup can take over at 
CHECKPOINT B (the transaction has com- 
pleted and been committed by TMF) and 
restore the context area from the WRITE to the 
swap file B. The requester can then pre- 
pare for a new transaction from the thread. 

If the requester fails at point 2, the trans- 
action will be backed out, and the backup 
will take over at CHECKPOINT B. RESUME- 
TRANSACTION will be called, and an error 
90 will be returned. The process must then 
call ABORTTRANSACTION to clear the 
TFILE, restore the transaction context data 
from the WRITE to the swap file at A, and 
return control to BEGINTRANSACTION. 
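
The two-area scheme of Figure 10 can be sketched as follows. The class ThreadSwapContext and its method names are hypothetical, the swap file is modeled as an in-memory dictionary, and the context contents are invented for the example; the point is only that a WRITE to area B never disturbs the copy in area A.

```python
class ThreadSwapContext:
    """Toy swap-file record for one thread, with two context areas."""

    def __init__(self):
        self.areas = {"A": None, "B": None}
        self.checkpointed_area = None  # area location known to the backup

    def write(self, area, context):
        # Store a copy so later changes to the live context do not leak in.
        self.areas[area] = dict(context)

    def checkpoint(self, area):
        self.checkpointed_area = area

    def restore(self, area=None):
        # By default, read the area named in the last CHECKPOINT.
        return self.areas[area or self.checkpointed_area]


swap = ThreadSwapContext()
context = {"input": "debit 100", "reply": None}

swap.write("A", context)          # results of terminal I/O
swap.checkpoint("A")
context["reply"] = "ok"           # server I/O alters the live context
swap.write("B", context)          # results of server I/O

after_failure_1 = swap.restore()      # take-over at CHECKPOINT A
swap.checkpoint("B")
after_failure_2 = swap.restore("A")   # error 90: restart from area A
after_failure_3 = swap.restore()      # transaction completed: area B
```

In this model a failure at point 1 or point 2 restarts the transaction from the unaltered area A context, and a failure at point 3 resumes from area B.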

One last concern related to the swap file 
has to do with synchronization information. 
The requester must open the swap file 
with a sync depth of 1 (so that the file system 
and disc process will ensure that the data 
from the WRITE request will, in fact, be written 
to the swap file), and the swap file’s syn- 
chronization information may be included in 
the CHECKPOINT. 

In Figure 11, just before the failure, the 
second WRITE to the swap file B has been com- 
pleted, but the new sync ID has not been 
CHECKPOINTed to the backup. After the 
transaction has been backed out, the backup 
will take over at CHECKPOINT A. A new 
transaction will then be started. If another 
process has changed records to be accessed 
by this transaction, the context data from the 
WRITE to the swap file B will be different 
from that CHECKPOINTed by the old primary 
process before it failed. However, when the 
WRITE to the swap file B is executed, the 
file system will increment the sync ID (which 
is sync ID n from the last CHECKPOINT) 
to sync ID n+1, and the disc process will 
assume that this is a duplicate request. Thus, 
the WRITE will not be performed. This 
will yield incorrect context data for the 
thread. For this reason, the primary process 
should not CHECKPOINT swap file sync 
information to the backup process. 
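
The duplicate-request effect in Figure 11 can be shown with a deliberately simplified model. ToyDiscProcess is a hypothetical class, and the rule it applies (a request whose sync ID is not greater than the last one seen is treated as a duplicate) is an assumption made for illustration, not a description of the actual disc process.

```python
class ToyDiscProcess:
    """Toy disc process that suppresses requests it believes are duplicates."""

    def __init__(self, last_sync_id):
        self.last_sync_id = last_sync_id
        self.data = None

    def write(self, sync_id, data):
        if sync_id <= self.last_sync_id:
            return False          # assumed duplicate: WRITE not performed
        self.last_sync_id = sync_id
        self.data = data
        return True


n = 5
disc = ToyDiscProcess(last_sync_id=n)

# Old primary: second WRITE to the swap file uses sync ID n+1, then fails
# before the new sync ID is CHECKPOINTed.
primary_written = disc.write(n + 1, "context from first attempt")

# New primary: still holds sync ID n from the last CHECKPOINT, so its
# WRITE also carries n+1 and is discarded as a duplicate.
backup_written = disc.write(n + 1, "context from retried transaction")
```

After the second call the swap file still holds the context from the first attempt, which is the stale data the text describes.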
The Effect of Backup Creation on the TFILE 


When the primary requester creates its 
backup, one of the functions that must 
be performed is the synchronization of the 
TFILE for the backup. This involves issuing a 
call to RESUMETRANSACTION followed 
by the CHECKPOINT of the TFILE for each 
thread that the requester can service. One 
major concern is the synchronization of the 
TFILE in the event of a failure and subsequent 
backup creation. It should be noted 
that if a process is using the TFILE, the 
call to ENDTRANSACTION is a no-waited 
operation, which must be completed with 
a call to AWAITIO. The time between calls to 
ENDTRANSACTION and AWAITIO is 
termed ENDTRANSACTION state. The primary 
process must not attempt to create a 
backup and synchronize the TFILE if any 
thread is in ENDTRANSACTION state. 

Whenever ENDTRANSACTION is called 
by a process, information is sent to each 
CPU, which will indicate the change of state 
for the currently ending transaction. The 
BUSRECEIVE interrupt handler in each CPU 
takes this information and places it in the 
backup’s TFILE if it exists in that CPU. 

A problem could arise if, in the situation 
illustrated in Figure 12, the primary process 
attempts to create a backup at point 1. 
Two operations are occurring asynchronously. 
First, the state change from the call to 
ENDTRANSACTION is being processed, and 
second, the primary process is attempting 
to synchronize the backup’s TFILE. If the primary 
process has not had time to CHECKOPEN 
the backup’s TFILE before the change-of-state 
information is processed, that 
information will be thrown away, and the 
backup’s TFILE will not be in sync with 
the primary’s TFILE. The solution to this is 
to make sure that the primary process does 
not create a backup if any one of its threads 
is in ENDTRANSACTION state. 

Also, if it is ready to create a backup, the 
primary process should not put any new 
threads into ENDTRANSACTION state until 
all previous threads have completed ENDTRANSACTION 
and a backup process has 
been created. 


The Need for an Active Backup 


The call to CHECKPOINT is a WAITed 
operation. This means that the process 
issuing the CHECKPOINT is suspended 
until the completion of that operation. 
The multi-threaded requester should 
suspend activity for an individual 
thread when that thread is willing to 
suspend; the process itself should not suspend 
completely (this will suspend all threads) 
unless it has no more work to complete. Also, 
as the number of threads the requester is 
servicing increases, the number of CHECKPOINTs 
increases, causing the requester to 
suspend itself more often. This will significantly 
retard processing by the requester. 
The solution is to write an active backup process 
allowing the primary process to issue 
NOWAITed WRITEREADs to the backup 
process, suspend only that thread, and 
continue executing. 
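
The effect of a no-waited send to an active backup can be sketched with a small scheduling model. ToyRequester and its methods are hypothetical, and no real interprocess I/O takes place; the sketch only shows that one thread is set aside while the others remain eligible to run.

```python
from collections import deque


class ToyRequester:
    """Toy multi-threaded requester with an active backup."""

    def __init__(self, thread_ids):
        self.runnable = deque(thread_ids)
        self.awaiting_backup = set()

    def checkpoint_nowait(self, tid):
        # Start a no-waited send to the backup; suspend only this thread.
        self.runnable.remove(tid)
        self.awaiting_backup.add(tid)

    def backup_replied(self, tid):
        # Completion of the send: the thread may be serviced again.
        self.awaiting_backup.remove(tid)
        self.runnable.append(tid)

    def next_thread(self):
        return self.runnable[0] if self.runnable else None


requester = ToyRequester([1, 2, 3])
requester.checkpoint_nowait(1)              # thread 1 waits for the backup
still_runnable = requester.next_thread()    # the process keeps working
requester.backup_replied(1)                 # thread 1 rejoins the queue
```

With a WAITed CHECKPOINT, by contrast, the whole process would be suspended at the point where checkpoint_nowait is called.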
NonStop II 
Memory 
Organization 
and Extended 
Addressing 


Still maintaining a large 
degree of compatibility with the Tandem 
NonStop system, memory addressing 
and access on the NonStop II system have 
been extended to provide a logical address 
space of one gigabyte per processor, 
supported by a physical 
memory of up to 16 megabytes. This article 
describes the forms of logical addresses used 
by programs and how they are transformed 
into physical addresses in a processor. 


Address Translation Requirements 


All instructions, except those that initiate 
processor cold load or reload, reside 
in memory. Consequently, any meaningful 
instruction execution requires the processor 
to translate logical addresses into physical 
addresses, both to locate and interpret the 
instructions and to operate on their data. 
The processor performs this translation using 
internal registers and tables that define 
the correspondence between logical and 
physical memory addresses. 


Definitions 


For the purposes of this article, we will 
adopt the following definitions: 


Physical Memory: Electronic circuitry which 
stores information. It is organized into words 
of 16 bits each, and only whole words can 
be read from or written to physical memory. 
Each processor has from 256 thousand to 
four million words of physical memory. It is 
private to the processor and not shared 
with any other processor in the system. A word 
is stored in or retrieved from physical memory 
when the control circuitry is presented with 
a 23-bit address identifying the word to be 
accessed. Access to individual bytes requires 
that software or firmware perform the operations 
necessary to extract or insert bytes 
in words. 


Physical Memory Address (or “physical 
address”): A 23-bit number identifying a word 
of physical memory. 


Logical Memory: Memory as perceived by 
a program. Depending on the instruction being 
executed, a program views logical memory 
as either words or bytes, having either 16- or 
32-bit addresses. By utilizing information 
stored in registers dedicated to the purpose 
(map registers) and algorithms embodied 
in microcode, the memory references required 
for executing an instruction are transformed 
from logical addresses to physical addresses. 
The total amount of logical memory available 
to be shared by all the programs in a 
processor is 1,073,741,824 bytes (one gigabyte). 
This limit is independent of the amount of 
physical memory in the processor. 


Logical Memory Address (or “logical address”): 
A 16- or 32-bit number used within a pro- 
gram to identify a word or byte of memory. 


Virtual Memory: A technique for permitting 
the amount of logical memory currently 
allocated and in use to exceed the amount of 
physical memory in the processor. Images 
of the contents of logical memory are usually 
maintained in disc storage and brought into 
physical memory as required by program 
execution. Information no longer required for 
a program’s execution may be returned to 
disc so that the physical memory can be reused. 


Page: A block of memory 1024 words in 
length, beginning at an address that is a multiple 
of 1024 words. 


Organization of Logical Memory 


A program views memory as consisting of 
segments within which its instructions 
and data reside. It uses logical addresses to 
identify segments and words or bytes within 
them. Different types of segments and correspondingly 
different rules of access make 
it possible to share or not share segments and 
provide different levels of independence 
between the program and the memory it uses. 
There are three different, but related, types 
of segments. 


Absolute Segments 

Logical memory consists of up to 8192 absolute 
segments. Each segment consists of up to 
64 pages. Absolute segments are numbered 
from 0 to 8191, and within a processor there 
can be only one absolute segment having 
a given number. Thus, the 8192 segments constitute 
a resource that must be shared by 
all programs (processes and interrupt handlers) 
that can potentially be in execution within 
the processor. Some absolute segments are 
allocated by SYSGEN to the operating 
system. The remainder are managed by the 
operating system both for its own use and 
to satisfy the memory needs of application 
processes. GUARDIAN sets the length 
(in pages) of segments according to the needs 
of the programs using them, and the length 
of an absolute segment will change from time 
to time. Unused absolute segments have a 
length of zero. The pages in a segment are 
numbered sequentially starting with 0 and 
continuing up to the last page in the segment. 
The number of a page within a segment is 
called the relative page number, since it is 
relative to the beginning of the segment. 

Only a privileged program can address 
memory directly as an absolute segment. 
To do so it uses an absolute 
extended address. An absolute extended 
address consists of a 32-bit doubleword having 
the format illustrated in Figure 1. It is often 
convenient to think of bits 15 through 31 as a 
17-bit byte address within the segment. 


Relative Segments 

If all programs always used absolute extended 
addresses for all their memory references, 
the instructions making up a program would 
vary depending on which absolute segments 
contained its code and data. This situa- 
tion, similar to that which prevailed on most 
computers in the 1950’s, can be improved 
greatly by introducing a scheme of automatic 
“address relocation.” Address relocation 
permits all programs to be written as if the code 
and data reside at the same conventional 
logical addresses. When the program is 
executed, these relative addresses are automat- 
ically translated, by the computer, to the 
absolute addresses actually being used. 


[Figure 1: format of an absolute extended address] 


Thus, on the NonStop II system, programs 
can view memory as relative segments. 
This feature permits each program to begin 
numbering its segments with segment number 
0 and to utilize the same addresses regardless 
of which absolute segments the program 
is actually using. The maximum permitted 
relative segment number is 1027. The operating 
system assigns absolute segments to 
the program for its use, and the Tandem 
NonStop II hardware and microcode convert 
the relative memory references to the form 
needed to access the assigned absolute segments. 
Both privileged and non-privileged 
programs may use relative addresses. 


Figure 2 
Relative extended address. 

Bits     Contents 

0        0 (absolute bit) 
1        0 (reserved) 
2-14     Relative segment number 
15-20    Relative page number 
21-30    Word number 
31       Byte number 


Relative segments 0, 1, 2, and 3 always 
refer to particular absolute segments accessible 
to a program. They correspond to the pro- 
gram’s current data, system data, current code, 
and user code segments. The size and 
boundaries of these relative segments coincide 
with the size and boundaries of the corre- 
sponding absolute segments. 

Relative segments 4 through 1027 have a 
different relationship to the underlying abso- 
lute segments. Most of the time they will 
correspond one to one with absolute segments, 
but not always. The relocation mechanism 
permits a relative segment in this range to begin 
at any location within some absolute segment, 
and it permits the highest-numbered 
relative segment used (greater than three) 
to end at any byte. Thus, that highest-numbered 
relative segment need not contain 
an integral number of pages. These segments 
constitute the program’s current “extended 
data segment,” which is discussed in the 
next section. 

All relative segments can be addressed 
with 32-bit relative extended addresses. A rela- 
tive extended address is identical to an 
absolute extended address, except that bit 0 
contains 0 and the segment number field 
contains a relative segment number instead of 
an absolute segment number. The format 
is illustrated in Figure 2. In this case also, bits 
15 through 31 can be thought of as a 17-bit 
byte address within the segment. 
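
The field layout of Figure 2 can be expressed as a short decoding routine. This is a sketch in Python for illustration only; ExtendedAddress and decode_extended are hypothetical names. It relies on the numbering used in the text, in which bit 0 is the leftmost (most significant) bit of the 32-bit doubleword and bit 31 is the rightmost.

```python
from typing import NamedTuple


class ExtendedAddress(NamedTuple):
    absolute: bool   # bit 0
    segment: int     # bits 2-14  (13 bits)
    page: int        # bits 15-20 (6 bits)
    word: int        # bits 21-30 (10 bits)
    byte: int        # bit 31


def decode_extended(addr: int) -> ExtendedAddress:
    """Split a 32-bit extended address into its fields."""
    if not 0 <= addr < 1 << 32:
        raise ValueError("extended addresses are 32 bits wide")
    return ExtendedAddress(
        absolute=bool(addr >> 31),
        segment=(addr >> 17) & 0x1FFF,
        page=(addr >> 11) & 0x3F,
        word=(addr >> 1) & 0x3FF,
        byte=addr & 1,
    )


def byte_offset_in_segment(addr: int) -> int:
    """Bits 15 through 31 viewed as a 17-bit byte address."""
    return addr & 0x1FFFF
```

For example, decode_extended(0o2000000) gives relative segment 4, page 0, word 0, byte 0, since octal 2000000 equals 4 shifted left by 17 bits.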

As mentioned above, the first four relative 
segments (0, 1, 2, and 3) correspond to the 
program’s current data, system data, current 
code, and user code segments. A program 
can address these relative segments using 
either relative extended addresses or 16-bit 
addresses (which are also relative). The 
16-bit addresses are essentially the same as 
16-bit addresses on the Tandem NonStop 
system. The Instruction Processing Unit (IPU) 
interprets 16-bit addresses depending upon 
the address use (i.e., code access or data access), 
Environment Register bits, and instruction 
word. To the extent that the two systems have 
code compatibility, they interpret 16-bit 
addresses identically. 

Depending on the value of the PRIV, DS, 
CS, and LS bits in the ENV register, a pro- 
gram may be able to access up to four code 
segments (user code, user library, system 
code, and system code extension) and two data 
segments (user data and system data), but 
not all at the same time. No more than four 
of these segments can ever be accessible 
simultaneously, using either 16-bit addresses or 
relative extended addresses. 

The segments accessible via 16-bit addresses 
are selected as follows: 


• Current data (relative segment 0): Normal 
program data references are in this segment. 
It is either user data or system data, 
depending on the DS bit in the ENV register 
(0: user data, 1: system data). In the 
NonStop II kernel, only interrupt handlers 
execute with the DS bit set to 1. 


• System data (relative segment 1): Privileged 
programs can execute instructions that 
access system data regardless of the setting 
of the DS bit. Non-privileged programs 
executing the same instructions access current 
data (with no error indication). 


• Current code (relative segment 2): A program’s 
instructions always come from the 
current code segment, which may be user 
code, user library, system code, or system 
code extension, depending on the values of the 
CS and LS bits in the ENV register. The 
selection rule is: 


CS   LS   Code Segment Used 

0    0    user code 
0    1    user library 
1    0    system code 
1    1    system code extension 


• User code (relative segment 3): In order to 
permit procedures executing in other code 
segments access to read-only arrays stored in 
the user code segment, certain non-privileged 
instructions are able to specify the user 
code segment as a data source, using 16-bit 
addresses. 


Note: If the DS bit is a 1, relative segments 
0 and 1 both refer to the same absolute 
segment (system data). If CS and LS are both 
0, relative segments 2 and 3 both refer to 
the same absolute segment (user code). 
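
The selection rules above can be collected into one lookup. The function name segments_for_16bit is hypothetical and the segments are represented simply by their names; the sketch describes relative segment 1 as seen by a privileged program and ignores the PRIV bit.

```python
CODE_SEGMENT_BY_CS_LS = {
    (0, 0): "user code",
    (0, 1): "user library",
    (1, 0): "system code",
    (1, 1): "system code extension",
}


def segments_for_16bit(ds: int, cs: int, ls: int) -> dict:
    """Map relative segments 0-3 to the segments they currently denote."""
    return {
        0: "system data" if ds else "user data",   # current data
        1: "system data",                          # system data
        2: CODE_SEGMENT_BY_CS_LS[(cs, ls)],        # current code
        3: "user code",                            # user code
    }


interrupt_handler_view = segments_for_16bit(ds=1, cs=0, ls=0)
```

In interrupt_handler_view, relative segments 0 and 1 both name system data and relative segments 2 and 3 both name user code, matching the note above.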


Using relative extended addresses to access 
relative segments 0, 1, 2, and 3 is an alter- 
native to addressing the same memory loca- 
tions using 16-bit addresses. A 16-bit address 
is always faster. Often there is no compen- 
sating advantage to be gained by using the 
extended address. Occasionally, however, 
the features listed below make extended 
addressing preferable. 


• Uniform byte addressing over the whole 
segment (instead of the restricted byte addressing 
available using 16-bit addresses, and 
the restricted range of SG mode addressing). 


• Ability to use the entire set of extended 
address instructions with any segment. (Note, 
however, that an attempt to store into relative 
segments 2 or 3 will fail with an instruction 
failure.) 


Neither 16-bit addresses nor relative 
extended addresses permit a program to read 
a code segment other than user code or 
the current code segment at any given instant; 
therefore, at least two (and often three) 
code segments are inaccessible via these types 
of address. All four code segments can be 
accessed at any given time, however, by using 
absolute extended addresses. 


Extended Data Segments 

Relative extended addresses in relative 
segments four and above are always within an 
extended data segment. An extended data 
segment is a contiguous block of 
memory having a length of from 1 to 
134,217,728 bytes. Its boundaries need 
not be on either absolute segment or page 
boundaries. It may consist of part of an absolute 
segment or up to 1024 full absolute 
segments. If it contains more than one absolute 
segment, however, the absolute segments 
must be contiguous. 

A process may have many extended data 
segments defined and allocated at any instant, 
but it can access only one at a time, since 
the segment relocation mechanism provides 
for only one base address. Since there are 
only 8192 absolute data segments in a pro- 
cessor’s logical memory, a process that 
allocates several very large extended data seg- 
ments may reduce the number of absolute 
data segments available to other processes in 
the same processor to a level that inter- 
feres with their operation. Excessive use of 
large extended data segments can inter- 
fere with GUARDIAN operations in other 
ways as well. 

GUARDIAN provides facilities allowing 
processes to allocate, deallocate, and identify 
extended data segments. It also provides 
facilities for switching from one to another. 
When a program uses a relative extended 
address having a value of %2000000 or 
more (segment four and above), it refers to 
the currently selected extended data seg- 
ment. Extended data segments can be accessed 
only via extended addresses; there is no 
way to access them via 16-bit addresses. 

The terminology can be somewhat confus- 
ing. We have defined three different kinds 
of segments. They are summarized below. 


1. Absolute Segment: A block of memory con- 
taining 0 to 64 pages. An absolute seg- 
ment always contains some integral number 
of whole pages. Each processor in a 
system can have up to 8192 absolute 
segments. 


2. Relative Segment: A block of memory 
containing 1 to 131,072 bytes. The use of 
relative segments permits programs to 
be written without concern for the identity 
or allocation of the absolute segments 
they utilize. Each process can access up to 
1028 relative segments at a given time. 


3. Extended Data Segment: A block of memory 
containing 1 to 134,217,728 bytes. An 
extended data segment begins at relative 
segment 4, byte 0, and extends through 
however many relative segments are needed 
to make up its total size. Extended data 
segments are composed of one or more rela- 
tive segments. They provide a way for a 
program to organize and manage large blocks 
of data storage. 
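
The sizes quoted in this summary follow from the page and segment definitions given earlier. The short calculation below is only a check of that arithmetic; the constant names are invented for the example.

```python
WORD_BYTES = 2                  # a word is 16 bits
PAGE_WORDS = 1024               # a page is 1024 words
PAGES_PER_SEGMENT = 64          # maximum pages in an absolute segment
ABSOLUTE_SEGMENTS = 8192        # absolute segments per processor
MAX_RELATIVE_SEGMENT = 1027     # highest relative segment number
FIRST_EXTENDED_SEGMENT = 4      # extended data segment starts here

page_bytes = PAGE_WORDS * WORD_BYTES
segment_bytes = PAGES_PER_SEGMENT * page_bytes
logical_memory_bytes = ABSOLUTE_SEGMENTS * segment_bytes
relative_segments = MAX_RELATIVE_SEGMENT + 1
extended_segments = MAX_RELATIVE_SEGMENT - FIRST_EXTENDED_SEGMENT + 1
extended_data_bytes = extended_segments * segment_bytes
```

The results are 131,072 bytes per full segment, 1,073,741,824 bytes of logical memory, 1028 relative segments, and 134,217,728 bytes for the largest extended data segment.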


Translating Logical Addresses 
into Physical Addresses 


When an IPU needs to read its next 
instruction or to read or write data in 
memory, it starts with a logical memory 
address which must be transformed into a 
physical memory address. The transforma- 
tion is performed in two stages. The IPU con- 
verts the logical address, 16-bit or 32-bit, 
to a mapped address. The memory mapping 
unit then converts the mapped address into 
a physical memory address. 


Memory Mapping 

All access to physical memory, both for the 
IPU and the I/O channel, is performed via 
memory mapping registers. The GUARDIAN 
Memory Manager (or SYSGEN) assigns 
pages of physical memory to an absolute seg- 
ment as needed for program execution. 
They may or may not be contiguous in physical 
memory, but by converting the relative page 
number (within a segment) to a physical page 
number (in physical memory), the mapping 
unit makes them appear, to a program, to be 
a contiguous block of memory. 

The memory mapping unit consists of 16 
maps. A map is a set of 64 registers capable of 
containing the 64 physical page numbers 
that would be required by a segment of maxi- 
mum size. The registers within a map are 
numbered from 0 to 63 and correspond to the 
relative page numbers within a segment. 
The maps, like the physical memory, deal only 
with word addresses, not bytes. The IPU, 
or the channel, presents the memory mapping 
circuitry with a four-bit map number and 
a 16-bit word address. The map number picks 
one of the sixteen maps. The leftmost six 
bits of the 16-bit address are used to select one 
of the 64 registers within the map. This 
register then supplies 13 bits (a physical page 
number) which are joined with the right- 
most ten bits of the original address to form 
the 23-bit physical address. 
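
The mapping step just described can be sketched in a few lines. The function name mapped_to_physical is hypothetical, and the 16 maps are modeled as a list of 16 lists of 64 integers rather than as hardware registers.

```python
PAGE_BITS = 10   # 1024 words per page


def mapped_to_physical(maps, map_number, word_address):
    """Convert a mapped address (map number, 16-bit word address)
    into a 23-bit physical word address."""
    if not 0 <= map_number < 16:
        raise ValueError("map number is four bits")
    if not 0 <= word_address < 1 << 16:
        raise ValueError("word address is 16 bits")
    register = word_address >> PAGE_BITS              # leftmost six bits
    offset = word_address & ((1 << PAGE_BITS) - 1)    # rightmost ten bits
    physical_page = maps[map_number][register]        # 13-bit page number
    if not 0 <= physical_page < 1 << 13:
        raise ValueError("physical page number is 13 bits")
    return (physical_page << PAGE_BITS) | offset


maps = [[0] * 64 for _ in range(16)]
maps[3][5] = 0x1ABC                                   # arbitrary example page
physical = mapped_to_physical(maps, 3, (5 << 10) | 0x155)
```

Here relative page 5 of the segment held in map 3 is located in physical page 0x1ABC, and the word offset 0x155 within the page is carried through unchanged.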

Regardless of its original form, the logical 
address must, in the final stage of translation, 
go through this mapping step. The com- 
bination of the map number and 16-bit word 
address is a mapped address. When the physi- 
cal page numbers corresponding to a par- 
ticular absolute segment are contained in some 
map, the segment is said to be “mapped.” 

The remainder of this article describes how 
the various forms of logical addresses are 
transformed into mapped addresses. 


Translating an Absolute Extended Address 
to a Mapped Address 

Since there are 8192 absolute segments, each 
of which could require a map to identify 
its physical pages, and there are only 16 maps, 
we must have ways of managing the use of 
maps, and we must be prepared for the case in 
which a segment is not mapped. In general, 
therefore, to translate an absolute extended 
address to a mapped address we must determine 
whether the segment is mapped, and if so, 
by which map. If the page we wish to access 
is not mapped, then its absolute page number 
must be placed into a map register before 
the memory access can be completed. 

To keep track of whether each absolute 
segment is mapped, and if so by which map, 
each processor has a table of 8192 elements, 
one for each segment. This table, called 
the Segment Table, is stored in memory. To 
access other parts of memory, we must first 
be able to read the Segment Table. Thus, it 
must remain mapped at all times. For rea- 
sons explained later, the amount of informa- 
tion we need to keep about each segment 
is sufficient to fill two words. Therefore, the 
Segment Table is defined to consist of 8192 
doubleword entries, and can thus occupy up 
to 16 pages of memory. So that we can 
always reach it when we need to, 16 registers 
from map 14 are reserved for mapping this 
table. (Registers 28 through 43 in map 14 are 
used for this purpose.) 
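
The arithmetic behind these figures can be checked with a short Python sketch. The starting address %70000 is the one quoted in the summary of translation steps later in this article; the function name and the rule "base plus two words per segment" for locating an entry are assumptions made for illustration.

```python
SEGMENTS = 8192            # one entry per absolute segment
WORDS_PER_ENTRY = 2        # doubleword entries
WORDS_PER_PAGE = 1024
TABLE_BASE = 0o70000       # word address of the table in map 14

def segment_table_entry_address(segment):
    """Word address (in map 14) of a segment's entry; assumed layout."""
    return TABLE_BASE + WORDS_PER_ENTRY * segment

table_words = SEGMENTS * WORDS_PER_ENTRY
table_pages = table_words // WORDS_PER_PAGE
first_register = TABLE_BASE // WORDS_PER_PAGE
last_register = (TABLE_BASE + table_words - 1) // WORDS_PER_PAGE

print(table_pages, first_register, last_register)   # 16 28 43
```

The table therefore fills exactly the 16 pages addressed through registers 28 through 43 of map 14.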

If an absolute segment is mapped, its entry 
in the Segment Table will tell us which map 
to use. Thus, we can give the memory mapping 
unit the correct map number and word 
address. If it is not mapped, however, we need 
to do two things: 


1. Find the physical page that corresponds
to the logical page we are trying to access.


2. Find at least one map register in which we 
can load this physical page number in order 
to execute the mapped memory access. 


To satisfy our first need, we could have
made each entry in the Segment Table long
enough to store 64 physical page numbers
so that we would have the information we need
immediately at hand. Since many absolute
segments are quite small, this solution would
entail inefficient use of both physical storage
and map registers. Instead, the Segment
Table contains the address (in some other
part of memory) where this information is to
be found. Each absolute segment of length
greater than zero has, somewhere in memory,
a Page Table which contains the physical
page number corresponding to each of its
relative pages (if physical pages have been
assigned to the relative pages). The segment’s
entry in the Segment Table tells us how to
find it. The format of a Segment Table entry
is illustrated in Figure 3.


Figure 3. Segment Table entry.

Bits     Contents
0-4      Map number where segment is mapped,
         or code (all 1's) indicating "not mapped"
5-8      Map number of this segment's Page Table
9-15     Size of this segment's Page Table
         (= number of pages in the segment)
16-31    Word address (to be used together with
         the map number in bits 5-8) of this
         segment's Page Table
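
As an illustration, the fields of Figure 3 can be unpacked from a 32-bit entry as follows. The sketch assumes the two words of the entry have been joined into one integer with bit 0 as the most significant bit; the helper and key names are hypothetical.

```python
NOT_MAPPED = 0b11111   # all 1's in bits 0-4

def field(value, first, last, width=32):
    """Extract bits first..last, counting bit 0 as the most significant."""
    length = last - first + 1
    return (value >> (width - 1 - last)) & ((1 << length) - 1)

def decode_segment_entry(entry):
    """Split a Segment Table entry into the fields listed in Figure 3."""
    map_code = field(entry, 0, 4)
    return {
        "mapped": map_code != NOT_MAPPED,
        "map": None if map_code == NOT_MAPPED else map_code,
        "page_table_map": field(entry, 5, 8),
        "page_table_size": field(entry, 9, 15),
        "page_table_address": field(entry, 16, 31),
    }

# Hypothetical entry: not mapped, 12-page segment, Page Table at
# word %4000 of map 6.
entry = (NOT_MAPPED << 27) | (6 << 23) | (12 << 16) | 0o4000
print(decode_segment_entry(entry))
```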

Each segment of length greater than zero 
has a Page Table. A Page Table consists of an 
array of words, one for each page in the 
absolute segment. Each entry in the Page Table 
contains the information that must appear 
in the corresponding register of a map when 
the segment is mapped. Thus, a segment 
is mapped by moving the contents of its Page 
Table to a map and setting any map registers 
not used to indicate the page is not present. 
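
A minimal sketch of this operation follows. It assumes that a register is marked "not present" by setting the Absent bit (bit 15, the rightmost bit of a Page Table entry as laid out in Figure 4) and leaving the other bits zero; the function name is hypothetical.

```python
ABSENT = 0b001             # bit 15 of a Page Table entry
REGISTERS_PER_MAP = 64

def load_map(page_table):
    """Build the 64 register values that map one segment."""
    if len(page_table) > REGISTERS_PER_MAP:
        raise ValueError("a segment has at most 64 pages")
    unused = REGISTERS_PER_MAP - len(page_table)
    # Copy the Page Table, then mark every remaining register absent.
    return list(page_table) + [ABSENT] * unused

registers = load_map([8, 16, 24])    # a three-page segment
print(registers[:5])
```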

The memory in which the Page Tables are 
kept must, of course, be continuously mapped. 
In principle any map could be used, but 
the conventional assignments lead to the Page 
Tables being stored in segments mapped 
by maps 6 through 13. GUARDIAN assures 
that the memory containing the Page Tables 
is always accessible. 


A Page Table entry contains, in addition to 
the 13-bit physical page number, three bits 
used by the hardware and the Memory Manager 
to implement virtual storage. (The maps 
serve a dual role. They not only supply the 
upper 13 bits of a physical memory address, 
but also assist in virtual memory bookkeeping, 
keeping a record of whether a page is pres- 
ent or absent and whether it has been read from 
or written to by the IPU.) The format of a 
Page Table entry is illustrated in Figure 4. 
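
The 16-bit layout of Figure 4 can be packed and unpacked as shown below. This is a sketch only: the function names are hypothetical, and a bit value of 1 is assumed to mean "referenced," "dirty," or "absent."

```python
def encode_page_entry(physical_page, referenced=False, dirty=False, absent=False):
    """Pack a 13-bit page number and three flag bits into one word."""
    if not 0 <= physical_page < (1 << 13):
        raise ValueError("physical page number must fit in 13 bits")
    return (physical_page << 3) | (referenced << 2) | (dirty << 1) | int(absent)

def decode_page_entry(entry):
    """Unpack a Page Table entry (bit 0 is the most significant bit)."""
    return {
        "physical_page": entry >> 3,          # bits 0-12
        "referenced": bool(entry & 0b100),    # bit 13
        "dirty": bool(entry & 0b010),         # bit 14
        "absent": bool(entry & 0b001),        # bit 15
    }

print(decode_page_entry(encode_page_entry(0o1234, dirty=True)))
```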

To summarize the steps to this point:
for an unmapped segment, the microcode
examines the Segment Table and locates
the Page Table for the segment. It now has
the physical page number of the page it needs,
but has not yet selected a map in which to
load it.

The allocation and control of maps 0 
through 13 have been left to SYSGEN and 
GUARDIAN. A single map, map 15, is 
reserved to the IPU microcode for the exclu- 
sive purpose of giving access to pages in 
unmapped segments. 

It would have been possible to respond to 
any reference to an unmapped absolute 
segment by mapping the entire segment in map 
15 and then proceeding with the memory 
access in the normal way. Instead, however,
the microcode maps only a single page at
a time. This strategy speeds up this type of
memory access and permits individual 
pages from different absolute segments to be 
mapped simultaneously. The technique used 
results in map 15 being used as a Map Cache. 


Figure 4. Page Table entry.

Bits     Contents
0-12     Physical page number
13       Referenced bit
14       Dirty bit (written into)
15       Absent bit


One of the problems which must be solved 
in any cache management scheme is keep- 
ing track of the contents of the cache. In the 
case of the Map Cache, we must know the 
absolute segment number and the relative page 
number within the segment for each page 
that is mapped. The solution adopted for the 
Tandem NonStop II system involves using 
only half of map 15 for actually mapping mem- 
ory and the remainder as memory space 
to keep track of the contents of the “working” 
half. Thus, in map 15, registers 0 through 
31 contain normal map entries (Page Table 
entries) while registers 32 through 63 con- 
tain “tags”; register 32 containing the tag for 
register 0, register 33 containing the tag for 
register 1, etc. 

Tags must be designed so that they fit within 
a map register (16 bits). Since an absolute 
segment number requires 13 bits and a relative 
page number requires 6, a map register can- 
not contain a number large enough to uniquely 
identify a given page of logical memory. 
Instead a kind of “half-map” register assign- 
ment is used in which the Page Table entry 
is placed into its corresponding map register, 
modulo 32. That is, map 15, register 0 maps 
relative pages 0 or 32, register 1 maps relative 
pages 1 or 33, and so on. Then the tag in 
the corresponding register in the upper half 
need only contain the absolute segment 
number plus one bit to indicate whether the 
relative page number actually mapped is 
less than 32, or 32 and up. 

This scheme has two advantages: 


1. The first word of an absolute extended
address contains exactly the right information
needed for the tag.

2. No searching is needed to determine whether 
a page is mapped. Bits 16 through 20 of 
an absolute extended address can be used 
as an index to select both the tag word 
for checking, and the map entry for the 
corresponding memory access. If a page 
is mapped, it will be mapped only in that one 
register in the lower half of map 15, and 
its tag will be in the corresponding register 
in the upper half of map 15. 


The tag actually used in the Map Cache is 
identical to the first word of the absolute 
extended address except that bit 0 contains 0. 
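
The index and tag can be derived from a 32-bit absolute extended address as sketched below. The helper `make_address` is hypothetical and exists only to build test addresses; it treats bits 0 through 14 of the first word as one opaque field, since the exact position of the segment number within that word is not restated here.

```python
def cache_index(ext_addr):
    """Bits 16-20: the relative page number modulo 32."""
    return (ext_addr >> 11) & 0x1F

def cache_tag(ext_addr):
    """First word of the address (bits 0-15) with bit 0 cleared."""
    return (ext_addr >> 16) & 0x7FFF

def make_address(upper_field, relative_page, word_offset):
    """Assemble an address: bits 0-14, page (bits 15-20), offset (bits 21-30)."""
    return (upper_field << 17) | (relative_page << 11) | (word_offset << 1)

low = make_address(0x123, 5, 0)     # relative page 5
high = make_address(0x123, 37, 0)   # relative page 37 = 5 + 32
print(cache_index(low), cache_index(high))     # same register
print(hex(cache_tag(low)), hex(cache_tag(high)))  # tags differ in the last bit
```

Pages 5 and 37 of the same segment compete for register 5 of map 15, and the rightmost bit of the tag records which of the two currently occupies it.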

The technique of indexing into the two
halves of the Map Cache is very important to 
the speed of operation of the IPU. In the 
actual implementation of extended addressing 
the first word of an absolute extended address 
(with bit 0 cleared) is compared to the 
appropriate tag word in the Map Cache before 
checking the Segment Table to see if the 
whole segment is mapped. If the page is not 
in the Map Cache, then the Segment Table 
is checked to see whether it is mapped in one 
of the other maps. Since a map register 
can be accessed (for comparison) much faster 
than a word in memory, the fastest access 
to memory, using an extended address, is to 
a page already mapped in the Map Cache. 

A summary of the steps required to translate 
an absolute extended address to a mapped 
address is given below. 


1. Use bits 16 through 20 of the extended 
address as an index into map 15. Call the 
index i. 


2. Compare map 15, register i+32 to the 
first word of the extended address (with bit 
0 cleared). 


3. If the comparison shows the values are 
equal, map 15, register i contains the cor- 
rect Page Table entry for this extended 
address. The mapped address consists of a 
map number of 15 and a word address 
taken from bits 16 through 30 of the 
extended address. (The first bit of the word 
address is 0.) 


4. If the comparison shows the values are 
not equal, perform the remaining steps. 


5. Using the segment number as an index, 
look up the correct entry in the Segment 
Table for this segment. (The table begins 
at word %70000 in map 14.) 


6. If bits 0 through 4 of the Segment Table 
entry contain a map number, this segment 
is mapped. Use this map number and 
the word address (bits 15 through 30) from 
the extended address to provide the 
mapped address. 


7. If bits 0 through 4 of the Segment Table 
entry contain all ones (%37), this segment 
is not mapped. Perform the remaining steps. 


8. Read the Page Table entry from memory. 
(Use the relative page number from the 
extended address as an index into the seg- 
ment’s Page Table. The mapped address 
of the Page Table consists of the Page Table 
map and the Page Table address from 
the Segment Table entry. See Figure 3.) 


9. If map 15, register i+32, contains all 1's,
the position in the Map Cache is free. Place 
the Page Table entry in map 15, register 
i, and the correct tag in map 15, register 
i+32. The page is now mapped. Complete 
the memory access as in step three. 


10. If map 15, register i+32, contains a
valid tag for some other page, locate the
Page Table for that segment and store the
contents of map 15, register i, into the
correct entry. Then the position in the Map
Cache is free; proceed as in step nine.


Converting a 16-bit Address to a Mapped Address

For 16-bit logical addresses, selecting the 
correct map is simple. The first six maps have 
been assigned specific functions correspond- 
ing to the memory segments normally used 
by a process. These assignments are: 


Map    Segment

0      user data
1      system data
2      user code
3      system code
4      user library
5      system code extension


It is a responsibility of GUARDIAN to 
assure that whenever a process is allowed to 
run, these six maps contain the correct 
physical page numbers for that process. In 
practice all processes share the code and 
data mapped by maps 1, 3, and 5. The contents 
of maps 0, 2, and 4 may have to be changed 
when a process becomes active. Which maps 
are used is determined by the values of the 
CS, DS, LS, and PRIV bits of the Environment 
Register at the time an instruction is
executed. (See the NonStop II System Description
Manual.)


Translating a Relative Extended Address to a Mapped Address


Relative extended addresses in relative
segments 0 through 3 are converted to 
mapped memory addresses as follows: 


• Bits 15 through 30 are used as a word address.


• A map is selected according to the table
shown in Figure 5.


If the relative segment number is four or 
above, the IPU converts the relative extended 
address to an absolute extended address by 
adding a 32-bit base value to it. The resulting 
absolute extended address is then translated 
to a mapped address as described previously. 

The base value is a 32-bit quantity kept 
in map 14, registers 60 and 61. Since a full 32- 
bit addition is performed, the relocation 
base can be at any byte address in any abso- 
lute segment. For the typical extended data 
segment the relocation base is set to the 
beginning of an absolute segment (the first in 
the extended segment). 

Figure 5. Map Selection Table.

Before adding the base, the IPU performs
a bounds check on a relative extended address
to detect whether a program is trying to
go beyond the length of the extended data
segment currently in use. Map 14, registers
62 and 63, contain a 32-bit number which,
if added to a relative extended address too
large to fit in the extended data segment, will
produce a carry out of bit 0. If this check
produces the carry, the IPU invokes the
instruction failure interrupt handler.
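
The bounds check and relocation can be sketched as follows. The names are hypothetical, and the encoding of the limit value is an assumption chosen to match the description: it is taken to be 2^32 minus the first relative address that lies beyond the segment, so that the addition carries exactly when the address is too large.

```python
MASK32 = 0xFFFFFFFF

class InstructionFailure(Exception):
    """Stands in for the instruction failure interrupt."""

def limit_value(first_address_beyond_segment):
    """Assumed contents of map 14, registers 62 and 63."""
    return (1 << 32) - first_address_beyond_segment

def relative_to_absolute(rel_addr, base, limit):
    """Bounds-check a relative extended address, then add the 32-bit base."""
    if rel_addr + limit > MASK32:          # carry out of bit 0
        raise InstructionFailure("address beyond extended data segment")
    return (rel_addr + base) & MASK32      # full 32-bit addition

limit = limit_value(5000)
print(hex(relative_to_absolute(4999, 0x00100000, limit)))
```

A single addition and a test of the carry replace an explicit comparison against the segment length.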


Access to Single Bytes 

Both the physical memory and the memory 
maps use word addresses. When the IPU 
executes an instruction that reads a single byte 
from memory, it must read the whole word 
from memory and select the correct byte from 
the word. When it executes an instruction 
that writes a single byte into memory, it must 
first read the whole word, modify the byte 
within the word, and then write the whole word 
back into memory. These functions are 
performed in the IPU microcode. 
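
The read and read-modify-write sequences can be sketched as shown below. The function names and the list standing in for memory are hypothetical, and the sketch assumes that an even byte address selects the left (high-order) half of the word.

```python
def read_byte(memory, byte_address):
    """Read the whole word, then select one of its two bytes."""
    word = memory[byte_address >> 1]
    if byte_address & 1:
        return word & 0xFF       # odd address: right byte
    return word >> 8             # even address: left byte (assumed)

def write_byte(memory, byte_address, value):
    """Read the word, replace one byte, and write the whole word back."""
    word = memory[byte_address >> 1]
    if byte_address & 1:
        word = (word & 0xFF00) | (value & 0xFF)
    else:
        word = (word & 0x00FF) | ((value & 0xFF) << 8)
    memory[byte_address >> 1] = word

memory = [0x1234]
write_byte(memory, 1, 0xAB)
print(hex(memory[0]), hex(read_byte(memory, 0)))
```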


Maps 6 through 13 
SYSGEN and GUARDIAN are free to use 
maps 6 through 13 for whatever purposes will 
best serve the needs of system operation. 
The IPU contains no built-in assumptions about 
their assignment or use. 

In practice they are used to map memory 
which must be kept continuously mapped 
for long periods of time. Primarily this memory 
serves two purposes: 


1. It is used to store Page Tables for absolute 
segments. (Recall that the Page Tables 
must be readily accessible for the Map 
Cache to work properly.) 


2. It is used for I/O buffers. (The I/O channel 
employs mapped addresses for its memory 
accesses. Rather than use the limited 
resources of system data space, GUARD- 
IAN uses maps 6 through 13 to map the 
buffers.) 